Problem Representation in Python, and N-Queens, Revisited
In this first-ever blog post, I want to take the opportunity to talk a little bit about representing problems — a process any CS student must first fully internalize before advancing to complex problem solving. To seasoned problem-solvers, the process of problem representation is so natural that us onlookers are quick to take it for granted. In this blog post, I'd like to look at three case studies — two are puzzles from the mathematical gold mine The PuzzlOR, and the third is the infamous N-Queens problem.
Enumeration and Exhaustion
Problem 1 — Toy Builder
Imagine you are a toy builder who can build three different toys — airplanes, helicopters, or cars. Each of these toys requires resources to make, namely blue blocks, green rods, or red wheels. Below is a table that summarizes the cost for each toy:
Once each toy is crafted, they can be sold (presumably on the luxury toy market) to net the following profits:
You begin with 25 blue blocks, 29 green rods, and 30 red wheels. The question is, of course,
What is the maximum profit you can achieve?
You do not have to exhaust your resources. At this point if you have an itch to try your hand at solving the puzzle, feel free to come back and compare approaches.
As the search space for this problem is quite restricted, one approach is to just enumerate over all possibilities, i.e., brute force. This cut-and-dried approach is (potentially very) expensive, but is less costly on us humans to implement in favor of something like a convoluted backtracking algorithm.
When implementing brute-force, there are generally three steps: (1) defining the representation of a candidate solution, (2) generating all possibilities, and (3) finding the candidate that solves our problem. "Solving the problem" means fitting a criteria, in this case the maximum cost-generating combination of toys to build.
Representation
This problem is relatively easy to represent — we can define a \(1\times3\) vector to hold the number of airplanes, helicopters, and cars made, respectively. I.e.,
is our representation of a candidate solution.
Generating possibilities
First, we need to limit each toy produced in a given solution by the maximum number we can make of that toy. This will be given by the floor of the minimum amount of toys you can make with any of the resources available, or
where \(c^{(i)}_{b,g,r}\) is the cost in blue/green/red resources of toy \(i\). We can query these results
starting_res = np.array([25, 29, 30])
toy_costs = np.array([[3, 2, 1], [2, 4, 1], [1, 2, 4]])
[np.floor(np.min(starting_res/toy_costs[i])) for i in range(3)]
[8.0, 7.0, 7.0]
to discover we can create a maximum of \(8\) airplanes, \(7\) helicopters, and \(7\) cars. To enumerate all possibilities, we can use a nested for loop to create each vector solution with the maximums in place. We can also cut down on the number of solutions to sift through in the future by axing solutions we know are wrong (if they use excess resources). We can write a function:
def toy_valid(sol):
i, j, k = sol[0], sol[1], sol[2]
cur_resources = np.array([25, 29, 30]) # initial
cur_resources -= np.array([3, 2, 1])*i + \
np.array([2, 4, 1])*j + \
np.array([1, 2, 4])*k # using resources
if np.all(cur_resources >= 0):
return True # valid solution
return False
that ensures after spending all required resources, we didn't over-use any.
Identifying the solution
Next, we can enumerate the solutions and accumulate the 'correct' ones with the \(O(n^3)\) for loop:
sols = []
for i in range(9): # airplanes
for j in range(8): # helicopters
for z in range(8): # cars
toy_eval = toy_valid([i, j, z])
if toy_eval: # if valid solution
sols += [[i, j, z]]
Let's check how many 'correct' solutions there are:
len(sols)
246
def toy_cost(sol):
return 7*sol[0] + 8*sol[1] + 5*sol[2]
max_cost = 0
cur_sol = None
for sol in sols:
if toy_cost(sol) > max_cost:
max_cost = toy_cost(sol)
cur_sol = sol
elif toy_cost(sol) == max_cost:
sim_sols += [sol]
max_cost, list(cur_sol)
(76, [5, 2, 5])
Revealing our profit-maximizing toy distribution to be 5 airplanes, 2 helicopters, and 5 cars, generating $76 profit in total.
Remember, this deterministic approach was only really used because we knew the number of solutions to check was low. If we increased the amount of starting resources, we'd increase the search space combinatorially (possibly leading to a combinatorial explosion, which, like most kinds of explosions, aren't fun).
Problem 2 — Port in a Storm
This is the (sadly apparent) last published puzzle from the great PuzzlOR, where it is your job to direct 20 boats at sea to their nearest dock (with a total of 20 dock spaces) in the midst of a storm.

There are three distinct clusters of boats (labelled above), and we can summarize them as follows:
And the dock spaces are:
Given the data, the question this time is:
What is the minimum total distance travelled by all boats?
Representation
With this puzzle, it is harder to see how we would go about abstracting a solution to a data structure. Several ideas are possible, some more efficient than others. Take for instance each boat represented by a list with relevant attributes:
where \(\text{dist}_i\) is the distance of the boat to the \(i\)-th dock.
This representation is clear and allows querying of the boat's travelled distance, or cost via \(\verb|boat|_i\verb|[0][1]|\). However, we must consider that twenty boats in an array means \(20\cdot4=100\) total elements in memory, and it's easy to see that many boats are essentially the same array, with the current_dock being the only altered attribute.
Okay, what about a different data structure? Since current_dock is the only variable attribute, can we encode that data without constantly altering it in some fashion? One possible representation is to transform current_dock a position in an array:
where \(\text{docks}[i]\) stores all boats docked at the \(i\)-th dock. In this representation, boats would change destinations via array swapping, which is possibly even more costly given the nature of array concatenation and slicing.
As explicit as this representation is, it still suffers from the redundancy issues of the boat representation, and its practical implementation must constantly ensure \(\text{len}(\text{docks}[i])\le\text{dock_spaces}_i\), which is at least three comparisons per iteration.
Let's move on from this nested array approach. In a single array, can we somehow encode the assigned dock of a boat and identify its cluster at the same time? Unsurprisingly, the answer is yes — probably in many ways.
Let's move forward with a representation that makes use of the position in the array to signify dock number, and integers 1–3 to signify boat cluster. Here is a sample solution that assigns boats to each dock in sorted order:

Functionally, in this array there are three 'hidden' sets — dock one, two, and three. Within these sets, order does not matter (this will drastically cut down on the search space later), and while we do have repeated elements, we can use these to index cost-distance arrays when evaluating a solution.
Generating possibilities
We can think of our representation as a result of choosing different individual boats from a pool of available boats, one dock at a time. We can define the algorithm for generating a given solution as
- Define a pool of available boats, \(S\)
- Choose \(4\) boats from \(S\), let this be the set \(D_1\)
- Choose \(7\) boats from \(\{S-D_1\}\), let this be the set \(D_2\)
- The remaining \(9\) boats in dock three will be \(D_3=\{S-D_1-D_2\}\)
Of course, the real algorithm must enumerate all choices of (2) and then (3).
In Python, we can define our pool and algorithm as such:
boat_nums = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]
sols = []
for first_dock in list(itertools.combinations_with_replacement([1, 2, 3], 4)):
# for every first-dock configuration
current = list.copy(boat_nums)
for boat in first_dock: # remove from possible further boats
current.remove(boat)
# for each dock 2 config
for second_dock in list(itertools.combinations(current, 7)):
all_left = list.copy(current)
for boat in second_dock: # find remaining boats to alight at third dock
all_left.remove(boat)
sols += [list(first_dock) + list(second_dock) + all_left] # append solution
Let's check how many solutions we have in total:
len(sols)
171600
Which is, again, not that many.
Identifying the solution
A cost function for the problem given our representation is easy — we simply iterate over each boat in each dock slice, applying the appropriate costs as distances:
costs = [[1, 2, 3], [3, 2, 2], [2, 1, 1]]
def distance_cost(s):
dist = 0
for boat in s[:4]: # first dock
dist += costs[0][boat-1]
for boat in s[4:11]: # second dock
dist += costs[1][boat-1]
for boat in s[11:]: # third dock
dist += costs[2][boat-1]
return dist
We can then iterate over all solutions and look for the minimum cost/distance travelled by a configuration of boats in docks:
min_dist = 999
cur_sol = 0
for sol in sols:
if distance_cost(sol) < min_dist:
cur_sol = sol
min_dist = distance_cost(sol)
min_dist, cur_sol # solution
(31, [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3])
And we find the minimum distance to be 31, and it just so happened to be that the minimum distance yielding configuration was our default case!
Exhaustion due to Enumeration
Of course, we should note the only reason we could feasibly solve the previous problems with brute-force was because of the small number of solutions to verify, and the guarantee our parameters/arguments were fixed.
But what happens when our problem is more general, or simply has too many possibilities to check?